Speech recognition of mandarin syllables using both linear predict coding cepstra and Mel frequency cepstra
Abstract
This paper compares the two most common features used to represent a spoken word in speech recognition, on the basis of accuracy, computation time, complexity, and cost. The two features are the linear predict coding cepstra (LPCC) and the Mel-frequency cepstrum coefficients (MFCC). The MFCC has been shown to be more accurate than the LPCC in speech recognition using the dynamic time warping method. In this paper, using the Bayes decision rule for classification, the LPCC gives a recognition rate about 10% higher than the MFCC and needs much less computation time to extract from the speech signal waveform: the MFCC needs 5.5 times as much computation time as the LPCC. The algorithm for computing the LPCC from a speech signal is also much simpler than the one for the MFCC. The MFCC has many parameters that must be adjusted to smooth the spectrum, performing processing similar to that executed by the human ear, whereas the LPCC is easily obtained by the least squares method using a set of recursive formulas.
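As an illustration of why LPCC extraction is cheap, the following is a minimal sketch, not the authors' implementation. It assumes the "least squares method" refers to the standard autocorrelation formulation of linear prediction solved by the Levinson-Durbin recursion, and that the cepstra are obtained with the usual LPC-to-cepstrum recursion for the all-pole model 1 / (1 - sum_k a_k z^-k). The function names and the sign convention are assumptions made for this example.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the least-squares normal equations of linear prediction.

    Returns (a, err), where a[k-1] is the predictor coefficient a_k in
    x[n] ~ sum_k a_k * x[n-k], and err is the final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused; a[1..order] are a_k
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p to cepstra c_1..c_n_ceps.

    Uses c_n = a_n + sum_{k} (k/n) * c_k * a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=12):
    """LPCC vector of one frame (order and n_ceps are illustrative defaults)."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

Under these assumptions the whole extraction is a few short loops per frame with no filter bank, Fourier transform, or tunable smoothing parameters, which is consistent with the abstract's remark that the LPCC is simpler and faster to compute than the MFCC.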
Similar resources
One-Sample Speech Recognition of Mandarin Monosyllables using Unsupervised Learning
In speech recognition, a Mandarin syllable wave is compressed into a matrix of linear predict coding cepstra (LPCC), i.e., a matrix of LPCC represents a Mandarin syllable. We use the Bayes decision rule on the matrix to identify a Mandarin syllable. Suppose that there are K different Mandarin syllables, i.e., K classes. In the pattern classification problem, it is known that the Bayes decis...
A simple statistical speech recognition of mandarin monosyllables
Each mandarin syllable is represented by a sequence of vectors of linear predict coding cepstra (LPCC). Since all syllables have a simple phonetic structure, in our speech recognition, we partition the sequence of LPCC vectors of all syllables into equal segments and average the LPCC vectors in each segment. The mean vector of LPCC is used as the feature of a syllable. Our simple feature does n...
An efficient mel-LPC analysis method for speech recognition
This paper proposes a simple and efficient time domain technique to estimate an all-pole model on a mel-frequency axis (Mel-LPC). This method requires only two-fold computational cost as compared to conventional linear prediction analysis. The recognition performance of mel-cepstral parameters obtained by the Mel-LPC analysis is compared with those of conventional LP mel-cepstra and the melfreque...
Maximum likelihood sub-band adaptation for robust speech recognition
Noise-robust speech recognition has become an important area of research in recent years. In current speech recognition systems, the Mel-frequency cepstrum coefficients (MFCCs) are used as recognition features. When the speech signal is corrupted by narrow-band noise, the entire MFCC feature vector gets corrupted and it is not possible to exploit the frequency-selective property of the noise si...
Robust features for speech recognition systems
In this paper we propose a set of features based on group delay spectrum for speech recognition systems. These features appear to be more robust to channel variations and environmental changes compared to features based on Mel-spectral coefficients. The main idea is to derive cepstrum-like features from group delay spectrum instead of deriving them from power spectrum. The group delay spectrum is...